Goto

Collaborating Authors

 population matching discrepancy and application


Population Matching Discrepancy and Applications in Deep Learning

Neural Information Processing Systems

A differentiable estimation of the distance between two distributions based on samples is important for many deep learning tasks. One such estimation is maximum mean discrepancy (MMD). However, MMD suffers from its sensitive kernel bandwidth hyper-parameter, weak gradients, and large mini-batch size when used as a training objective. In this paper, we propose population matching discrepancy (PMD) for estimating the distribution distance based on samples, as well as an algorithm to learn the parameters of the distributions using PMD as an objective. PMD is defined as the minimum weight matching of sample populations from each distribution, and we prove that PMD is a strongly consistent estimator of the first Wasserstein metric. We apply PMD to two deep learning tasks, domain adaptation and generative modeling. Empirical results demonstrate that PMD overcomes the aforementioned drawbacks of MMD, and outperforms MMD on both tasks in terms of the performance as well as the convergence speed.


Reviews: Population Matching Discrepancy and Applications in Deep Learning

Neural Information Processing Systems

This paper presents the population matching discrepancy (PMD) as a better alternative to MMD for distribution matching applications. It is shown that PMD is a sampled version of Wasserstein metric or earth mover's distance, and it has a few advantages over MMD, most notably stronger gradients and the applicability of smaller mini-batch sizes, and fewer hyperparameters. For training generative models at least, the MMD metric does suffer from weak gradients and the requirement of large mini-batches, the proposals in this paper therefore provides a nice solution to both of these problems. The small mini-batch claim is verified quite nicely in the empirical results. The verification of the stronger gradients claim is less satisfactory, since the MMD metric depends on the scale parameter sigma, it is essential to consider either the best sigma or a range of sigmas when making such a claim. In terms of having fewer hyper-parameters, I feel this claim is less well-supported, because PMD depends on a distance metric, and this distance metric might contain extra hyperparameters as well as in the MMD case.


Population Matching Discrepancy and Applications in Deep Learning

Chen, Jianfei, LI, Chongxuan, Ru, Yizhong, Zhu, Jun

Neural Information Processing Systems

A differentiable estimation of the distance between two distributions based on samples is important for many deep learning tasks. One such estimation is maximum mean discrepancy (MMD). However, MMD suffers from its sensitive kernel bandwidth hyper-parameter, weak gradients, and large mini-batch size when used as a training objective. In this paper, we propose population matching discrepancy (PMD) for estimating the distribution distance based on samples, as well as an algorithm to learn the parameters of the distributions using PMD as an objective. PMD is defined as the minimum weight matching of sample populations from each distribution, and we prove that PMD is a strongly consistent estimator of the first Wasserstein metric.